ABSTRACT


Tower-based surveillance systems have been employed by the U.S. military to enhance 
intelligence, surveillance, and reconnaissance capabilities in Iraq and Afghanistan. We 
consider a scenario wherein two surveillance towers are installed in separate locations; 
however, the surveillance team does not have enough operators to operate both towers to 
their capacity. Two strategies can be used to operate these two towers: stationary 
allocation and dynamic allocation. We formulate a two-person nonzero-sum game to 
analyze these strategies, in which the surveillance team wants to maintain regional 


stability while insurgents carry out attacks to disrupt it. 


Our analysis suggests that the dynamic allocation strategy can improve the 
performance of surveillance towers over stationary allocation under most circumstances. 
The improvement tends to be more significant when the surveillance team has more 
surveillance resources. Dynamic allocation tends to be less effective when (1) a
detected attack has a smaller negative impact on the insurgent operations, or when (2) a 


detected attack brings a larger immediate benefit to the surveillance team. 
EXECUTIVE SUMMARY


Dominant intelligence, surveillance, and reconnaissance capability is one of the key 
enablers in irregular warfare. This thesis is motivated by the deployment of Ground- 
Based Operational Surveillance System—a 24-hour all-weather tower-based surveillance 
system—to enhance situation awareness in Iraq and Afghanistan in the late 2000s. When 
the number of operators is not enough to staff all surveillance towers, we examined 
whether it is helpful to dynamically move the operators between them, in the hope that an 


understaffed surveillance tower can still deter insurgency activities. 


We considered the following scenario: two surveillance towers are installed in 
separate locations. However, the surveillance team does not have enough operators to 
operate both towers to their capacity. We compared two strategies: stationary allocation 
and dynamic allocation. With stationary allocation, the team splits up so that each tower 
is partially operational; with dynamic allocation, the team moves back and forth between 
the two towers at random intervals. We formulated a two-person nonzero-sum game, in 
which the surveillance team wants to maintain regional stability while the insurgents 


carry out attacks to disrupt it. 


Our analysis suggests that the dynamic allocation strategy can improve the 
performance of the surveillance towers over stationary allocation under most 
circumstances. The improvement tends to be more significant when the surveillance team 
has more surveillance resources. Dynamic allocation tends to be less effective when
(1) a detected attack has a smaller negative impact on the insurgent operations, or when 
(2) a detected attack brings a larger immediate benefit to the surveillance team. These 
findings can provide suggestions for decision makers in allocating resources to enhance 


ISR capabilities on the battlefield. 




ACKNOWLEDGMENTS 


First, I would like to express the deepest appreciation to my advisor Professor 
Kyle Lin, whose encouragement, guidance and support from the beginning to the final 
level enabled me to build up this thesis. Also, I would like to show my sincere gratitude 
to CDR Douglas Burton, whose professional opinions and guidance inspired me on the 


development of this work. 


Second, many thanks are due to my beloved family, especially my wife Chu-Mei. 
Her support and understanding allowed me to walk through many difficulties without 


worrying. 


Finally, I offer my best regards and blessings to all of those who helped me in any 


respect during my study here at the Naval Postgraduate School. 




I. INTRODUCTION 


A. BACKGROUND 
1. Intelligence, Surveillance, and Reconnaissance


In modern warfare, military forces seek not only to advance weapon systems, but 
also to enhance their capability in intelligence, surveillance, and reconnaissance (ISR). 
According to Joint Publication 1-02, ISR is an activity that synchronizes and integrates 
the planning and operation of sensors and assets, and processing, exploitation, and 
dissemination systems in direct support of current and future operations [1]. ISR is an 
integrated intelligence and operations function. For U.S. military operations in Iraq and 
Afghanistan, ISR capabilities are more critical than ever before, due to the nature of 
insurgent activities. Since insurgents can blend in easily with non-combatant citizens, an 
ability to dominate ISR on the battlefield is critical. A great deal of effort has been 
exerted to enhance ISR capabilities. For example, unmanned aerial vehicles and Ground- 
Based Operational Surveillance Systems (G-BOSS) have been deployed for some time 
with many documented success stories. Generally speaking, these systems provide 
surveillance and reconnaissance on the battlefield by collecting video and audio 
intelligence to enhance the commander's situational awareness on the battlefield. This 


thesis focuses on G-BOSS. 


2. Ground-Based Operational Surveillance System 


G-BOSS is a tower-based surveillance system derived from the sensor suite 
utilized on the Rapid Aerostat Initial Deployment, as shown in Figure 1. This system 
consists of four major assemblies: 


• Cameras, including one primary infrared camera (FLIR T-3000), and one
  electro-optical infrared camera (FLIR Star SAFIRE IIIFP).
• One mobile tower (approximately 107 feet tall).
• One Man-Portable Surveillance and Target Acquisition Radar (MSTAR).
• One Ground Control Station (GCS).


Currently, the United States Marine Corps (USMC) deploys G-BOSS to Iraq and 
Afghanistan to enhance ISR capabilities. The USMC awarded the initial $60 million G- 
BOSS contract to Raytheon on April 9, 2008. The goal is to use these surveillance 
systems to detect and disrupt insurgent activities. According to a news release from the 


Quantico Sentry [2], G-BOSS is being deployed in four phases. 


1. Phase One is deployment of G-BOSS to coalition outposts. During this phase, 
the system is operated manually at the base of each tower, with radio 
communications to the Combat Operation Center (COC) as shown in Figure 2. 


2. During Phase Two, G-BOSS is operated but the data/information is 
automatically fed to the COC. 


3. During Phase Three, G-BOSS is controlled from within the COC, with 
automatic slewing or rotating capabilities. During this phase, video storage 
capabilities are integrated. 


4. By Phase Four, the surveillance crew in charge of monitoring G-BOSS can 
track not only what is happening in their own region, but also that of the entire 
province through an integrated network. 


As this system is phased in, more monitors will be installed in COCs for 
consistent surveillance. It will thus be necessary to increase the number of operators to 
staff all systems in order to have better surveillance results. Increasing the number of 
operators, however, is usually a carefully considered constraint in combat situations. If it 
is not possible to increase the number of operators, the workload of current operators will 
then increase accordingly. According to Parasuraman and Mouloua [3], "the most 
significant factor that may influence the accuracy of monitoring under automation is task 
loads imposed on the operator." What can be done to make the best use of G-BOSS when 


facing a manpower constraint? 


This thesis explores the idea of assigning operators in a dynamic manner. Instead 
of assigning a single operator to one G-BOSS, the operator is moved among systems 
from time to time. Because the tower-based surveillance system is prominent in the areas 
where it is installed, the tower may still produce a deterrent effect for insurgents, even if 
it is not actively being monitored. With that idea in mind, one group of operators can be 
assigned to shift their attention back and forth between towers; the tower without 
operators will serve as a decoy. The objective in this thesis is to use mathematical models
to determine whether dynamic allocation of manpower can improve the performance of
multiple systems, and whether a decoy tower can provide a deterrent effect on
insurgents.





Figure 1. Top of surveillance tower 
(From: FLIR http://www.gs.flir.com/datasheets/land.cfm) 





Figure 2. G-BOSS control room (From: Quantico Sentry) 


B. OBJECTIVE 


The goal of this thesis is to determine whether it is helpful to dynamically move 
manpower between surveillance towers when manpower is limited. The following 
scenario is considered: two surveillance towers are installed in two desired locations. 
However, there is only one surveillance team to operate one tower at its full capacity. It is 
possible either to move the surveillance team back and forth between the two towers, or 
to split the team so that each tower can be partially operational. The thesis develops 
mathematical models to study these two strategies. The findings in this thesis can provide 


suggestions for decision makers while employing surveillance systems on the battlefield. 


C. RELATED WORKS


A significant amount of work has been done to improve the performance and 
effectiveness of surveillance systems in ISR and perimeter protection. The work can be 


divided into three categories. 


First, from the perspective of technology, the advances of cameras have had a 
significant influence on system performance, particularly the combination of a high- 
resolution charge-coupled device and electro-optics with an infrared sensor system [4]. 
With a longer surveillance range, higher image resolution, and information integration, a 
camera could remotely monitor an adversary’s activity day and night. These new 
surveillance technologies not only mitigate false detection rates, but also help reduce 


crew requirements. 


Second, there is a stream of work that uses mathematical modeling and 
optimization to improve surveillance results. Szechtman et al. [5] used mathematical
models to analyze optimal strategies for a moving surveillance sensor to detect infiltrators 
on a border. Midgette [6] proposed an agent-based simulation model to elevate the 
operational effectiveness of G-BOSS as guidance for system fielding. Also, William [7] 
carried out a surveillance and interdiction model with a game-theoretic approach to fight 


against vehicle-borne improvised explosive devices. 


Third, there are studies that compare the frequency of criminal activity before and 
after installation of surveillance monitors. Gill and Spriggs [8] summarized their research 
on the impact of using closed-circuit television (CCTV) in different cities throughout 
Great Britain thusly: 

The use of CCTV needs to be supported by a strategy outlining the 


objectives of the system and how these will be fulfilled. This needs to take 
account of local crime problems and prevention measures already in place. 


Welsh and Farrington [9] concluded their research about using CCTV in crime 
prevention as follows: 

Overall, it might be concluded that CCTV reduces crime to a small degree. 

In light of the successful results, future CCTV schemes should be 


carefully implemented in different settings and should employ high quality 
evaluation designs with long follow-up periods. 


Conclusions from previous research indicate that having appropriate surveillance
equipment is a key enabler toward better detection results in a surveillance plan, which 


corresponds to the objective of this thesis. 


D. THESIS ORGANIZATION 


The rest of this thesis is organized as follows. Chapter II formulates a two-person 
nonzero-sum game to model the interaction between coalition forces and insurgents. 
Coalition forces assign manpower between two surveillance towers, while insurgents 
launch attacks in order to interrupt regional stability. In Chapter III, numerical analysis is 
carried out to demonstrate the model. Situations are identified in which it is helpful to 
dynamically allocate manpower between the two surveillance towers. Finally, Chapter IV 


presents findings and suggests future research directions. 
II. METHODOLOGY


A. MODEL 


Consider a situation in which Blue has established military bases in two towns. 
Blue’s goal is to maintain peace and eliminate insurgent activities in these two towns. 
(From now on insurgent activities will be referred to as attacks for brevity.) Blue has one 
surveillance tower set up in each base, but Blue cannot detect all attacks in both towns at 
all times, due to a lack of resources (manpower, equipment, etc.). Denote by $s$ ($s \le 2$) the
total resources available to Blue, such that Blue can allocate detection probability $p_i$ to
tower $i$, as long as $p_1 + p_2 \le s$ and $0 \le p_i \le 1$, for $i = 1, 2$. The problem facing Blue is
how to allocate $s$ between the two surveillance towers.

In each town, an insurgent group attempts to carry out attacks for its own gain. 
The insurgent group operating in town $i$ is referred to as Red $i$, for $i = 1, 2$. Red 1 and Red
2 operate independently from each other. For each Red team, the status quo is not to 
attack, in which case neither Red nor Blue receives a reward or a penalty. If a Red team 
launches an attack, there are two possible outcomes: either the attack is detected by 
Blue’s surveillance tower, or it is not. The Red team earns a reward of $+1$ for each
undetected attack, and incurs a penalty $r > 0$ (reward $-r$) for each detected attack.
Because Blue’s goal is to maintain peace and ideally to eliminate attacks altogether, there
is a penalty for each attack regardless of whether or not the attack is detected. However,
detecting an attack is better than not detecting it, so Blue incurs a penalty 1 (reward $-1$)
for an undetected attack and a smaller penalty $b \in (0,1)$ (reward $-b$) for a detected attack.


Table 1 summarizes the reward for Blue and each Red team, respectively. 

We model the interaction between Blue and two Red teams as a nonzero-sum 
game, where Blue moves first, and then each Red team moves second, independently, 
after observing Blue's strategy. The objective of each player is to maximize his own long- 


run average reward. 


         | No attack | Attack undetected | Attack detected
Blue     | 0         | -1                | -b
Red      | 0         | +1                | -r

Table 1. Reward table


If the detection probability is $p$ in a town, Blue’s expected reward for each attack
is

$$(-1)(1-p) + (-b)p = -1 + (1-b)p, \tag{1}$$

and Red’s expected reward for each attack is

$$(+1)(1-p) + (-r)p = 1 - (1+r)p. \tag{2}$$

By setting Equation (2) to 0, we can solve for the threshold detection probability

$$\hat{p} = \frac{1}{1+r}. \tag{3}$$

If $p > \hat{p}$, Equation (2) is negative, so it is optimal for a Red team to shut down its
operation altogether. In the special case when $p = \hat{p}$, Red’s expected reward for each
attack is 0, so Red is indifferent between attacking and not attacking. For mathematical
completeness, however, assume that Red will continue to attack if $p = \hat{p}$, as it gives Blue
a negative expected reward.
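Equations (1) through (3) can be checked numerically with a minimal Python sketch. The function names below are illustrative only and do not come from the thesis; the threshold is written as 1/(1+r), the value obtained by setting Equation (2) to zero.

```python
def blue_reward_per_attack(p: float, b: float) -> float:
    """Equation (1): Blue's expected reward per attack at detection probability p."""
    return -(1.0 - p) - b * p


def red_reward_per_attack(p: float, r: float) -> float:
    """Equation (2): Red's expected reward per attack at detection probability p."""
    return (1.0 - p) - r * p


def deterrence_threshold(r: float) -> float:
    """Equation (3): detection probability at which Red's expected reward is zero."""
    return 1.0 / (1.0 + r)
```

With r = 4, the threshold is 0.2, which matches the value used in the numerical example of Chapter III.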


Suppose each Red team can carry out attacks at a maximum rate $x$, and let $\hat{p} = 1/(1+r)$
denote the threshold from Equation (3). Consider three cases for $s$:
1. $s \in (2\hat{p}, 2]$. If Blue allocates $p_1 = p_2 = s/2 > \hat{p}$, then both Red teams will stop
their operations. The long-run reward rate is 0 for all players.
2. $s \in [0, \hat{p}]$. No matter how Blue allocates $s$, both Red teams will continue to attack
at the maximum rate $x$. The total long-run reward rate for both Red teams is
$$x\bigl(1-(1+r)p_1 + 1-(1+r)p_2\bigr) = x\bigl(2-(1+r)s\bigr). \tag{4}$$
Blue’s long-run reward rate is
$$x\bigl(-1+(1-b)p_1 - 1+(1-b)p_2\bigr) = x\bigl(-2+(1-b)s\bigr). \tag{5}$$
3. $s \in (\hat{p}, 2\hat{p}]$. In this case, it is possible for Blue to allocate the detection
probability such that it is optimal for one Red team to stop its operation.
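The three cases can be sketched in Python as follows. This is an illustration under the assumption that the threshold equals 1/(1+r) and that Blue uses all of s in the second case; the function names and the returned labels are hypothetical.

```python
def classify_resources(s: float, r: float) -> str:
    """Return which of the three cases applies for total resources s."""
    if not 0.0 <= s <= 2.0:
        raise ValueError("s must lie in [0, 2]")
    p_hat = 1.0 / (1.0 + r)  # threshold from Equation (3)
    if s > 2.0 * p_hat:
        return "both_deterred"   # case 1: an even split deters both Red teams
    if s <= p_hat:
        return "none_deterred"   # case 2: neither Red team can be deterred
    return "one_deterred"        # case 3: Blue can deter exactly one Red team


def undeterred_rates(s: float, r: float, b: float, x: float) -> tuple[float, float]:
    """Equations (4) and (5): (Red total, Blue) reward rates when both Red teams attack at rate x."""
    red_total = x * (2.0 - (1.0 + r) * s)
    blue = x * (-2.0 + (1.0 - b) * s)
    return red_total, blue
```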
The rest of this section focuses on the case when $s \in (\hat{p}, 2\hat{p}]$. In particular, it
examines two strategies for Blue: stationary allocation and dynamic allocation.


1. Stationary Allocation 


With a stationary allocation, Blue assigns $p_i$ to surveillance tower $i$, $i = 1, 2$, on a
permanent basis. It is reasonable to assume that each Red team will discover this
allocation sooner or later, whether by intelligence or by computing its own success rate.
Without loss of generality, assume $p_1 \ge p_2$. It does not help to set $p_1 \le \hat{p}$, in which case the
optimal strategy for each Red team is to attack at the maximum rate $x$. If Blue
sets $p_1 = \hat{p} + \varepsilon$, for some $\varepsilon > 0$, then it is optimal for Red 1 to cease the operation, and
for Red 2 to attack at rate $x$.


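Using Equation (2), Red 2's long-run reward rate is
$$x\bigl(1-(1+r)(s-\hat{p}-\varepsilon)\bigr) = x\bigl(2-(1+r)(s-\varepsilon)\bigr), \tag{6}$$
which converges to
$$x\bigl(2-(1+r)s\bigr) \tag{7}$$
as $\varepsilon \downarrow 0$.

Using Equation (1), Blue’s long-run reward rate is
$$x\bigl(-1+(1-b)(s-\hat{p}-\varepsilon)\bigr) = x\left(-1+(1-b)\left(s-\frac{1}{1+r}-\varepsilon\right)\right), \tag{8}$$
which converges to
$$x\left(-1+(1-b)\left(s-\frac{1}{1+r}\right)\right) \tag{9}$$
as $\varepsilon \downarrow 0$.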
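As a rough numerical illustration of the stationary allocation, the sketch below evaluates the reward rates for a given slack eps and, with eps = 0, the limiting values in Equations (7) and (9). The function name and the eps argument are hypothetical, and the threshold is assumed to be 1/(1+r).

```python
def stationary_rates(s: float, r: float, b: float, x: float,
                     eps: float = 0.0) -> tuple[float, float]:
    """Reward rates (Red 2, Blue) when tower 1 gets p_hat + eps and tower 2 gets the rest."""
    p_hat = 1.0 / (1.0 + r)          # threshold from Equation (3)
    p2 = s - p_hat - eps             # detection probability left for tower 2
    red2 = x * (1.0 - (1.0 + r) * p2)    # Equation (6); Equation (7) when eps = 0
    blue = x * (-1.0 + (1.0 - b) * p2)   # Equation (8); Equation (9) when eps = 0
    return red2, blue
```

For r = 4, b = 0.5, s = 0.3, and x = 1, the limiting values are 0.5 for Red 2 and -0.95 for Blue.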


2. Dynamic Allocation 


With a dynamic allocation, Blue first assigns $p$ to one tower and $s-p$ to the
other tower, and then swaps these allocations from time to time. Without loss of
generality, assume $p > s-p$. The idea of dynamic allocation is to make $p > \hat{p}$,
where $\hat{p} = 1/(1+r)$ is the threshold in Equation (3), so that
sometimes it is optimal for a Red team to pause attacks, but each Red team needs to guess
when to resume attacks. The tower with detection probability $s-p$ can be viewed as a
decoy, which may provide a deterrence effect if a Red team does not know that detection
probability has dropped from $p$ to $s-p$.


Blue has two decision variables $p$ and $y$, such that Blue allocates detection
probability $p$ to one tower and $s-p$ to the other, and swaps these allocations at a
Poisson rate $y$. Assume that the battle goes on indefinitely, and that over time each Red
team learns about Blue's choices of p and y, but does not discover Blue's real-time 
allocation. Because the two Red teams do not interact with each other, and because the 
parameters are identical in the two towns, from now on the analysis will focus on the 


interaction between Blue and one Red team, henceforth Red for brevity. 


One feasible strategy for Red is to attack at a Poisson rate $x$. Alternatively, Red
can set aside some effort to learn about the real-time detection probability at a Poisson
rate $z$. Red can do this by sending a spy, bribing Blue’s people, or probing the system in
some way. We impose the constraint $x + \alpha z \le c$, where $\alpha > 0$ models the
tradeoff between the attack rate $x$ and the learning rate $z$, and $c$ is the maximum attack
rate if Red sets the learning rate to 0. With a learning rate $z > 0$, Red learns about
the detection probability at time moments that constitute a Poisson process with rate $z$. In
other words, the time between two consecutive learning points follows an exponential
distribution with rate $z$, independent of everything else.


Recall that $p > s-p$. We say a surveillance tower is in state 1 if its detection
probability is $p$, and in state 0 if its detection probability is $s-p$. In other words, each
tower remains in state 1 for a random time that is exponentially distributed with mean $1/y$,
and then switches to state 0 and stays in state 0 for another random time, which is also
exponentially distributed with mean $1/y$, and so on. In the long run, each tower will be in
each state 50% of the time. We say Red is in state 1 if Red is carrying out attacks at a
Poisson rate $x$, and in state 0 if Red pauses its attacks. Red decides when it wants to
move from one state to the other.
Because $p > \hat{p}$, where $\hat{p} = 1/(1+r)$ is the threshold in Equation (3), when Red
learns the tower is in state 1, Red should pause its
attacks. Let $P_{jk}(t)$ denote the probability that the tower will be in state $k$ after $t$ time
units if it is currently in state $j$, $j, k = 0, 1$. Using the result in Ross [10], we have
$$P_{11}(t) = \frac{1}{2} + \frac{1}{2}e^{-2yt} = 1 - P_{10}(t).$$

Red can compute the probability of detection $t$ time units after it learns that
Blue is in state 1, if Red does not have a learning point in those $t$ time units, as
$$P_{11}(t)\,p + P_{10}(t)\,(s-p) = \left(\frac{1}{2}+\frac{1}{2}e^{-2yt}\right)p + \left(\frac{1}{2}-\frac{1}{2}e^{-2yt}\right)(s-p).$$

Red should attack if this detection probability is less than $\hat{p}$. After some algebra,
we can show that Red should wait for another
$$\hat{t} = -\frac{1}{2y}\ln\left(\frac{2\hat{p}-s}{2p-s}\right) \tag{10}$$


time units before resuming attacks, if Red does not have another learning point in this
time period. Consequently, Red’s optimal strategy takes the following form, where $\hat{t}$
denotes the waiting time in Equation (10): whenever
Red learns that Blue’s tower is in state 1, Red pauses its attacks until the next learning
point or until $\hat{t}$ time units have elapsed. If Red learns Blue’s state is 0 within the next $\hat{t}$
time units, Red should resume attacks immediately; if Red does not have a learning point
within the next $\hat{t}$ time units, then Red resumes attacks after $\hat{t}$ time units. With this
strategy, we can define a renewal reward process, where a renewal is a time moment
when Red learns that Blue’s tower is in state 1. Figure 3 depicts this renewal reward
process.
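The waiting time in Equation (10) can be illustrated with a short Python sketch. It assumes the symmetric two-state transition probability 1/2 + (1/2)exp(-2yt) for staying in state 1 and the threshold 1/(1+r); the function names are hypothetical, and the strict bound s < 2/(1+r) is imposed here only so that the waiting time is finite.

```python
import math


def stay_probability(t: float, y: float) -> float:
    """Probability that a tower in state 1 is again in state 1 after t time units."""
    return 0.5 + 0.5 * math.exp(-2.0 * y * t)


def detection_probability(t: float, p: float, s: float, y: float) -> float:
    """Detection probability t time units after Red learned the tower was in state 1."""
    stay = stay_probability(t, y)
    return stay * p + (1.0 - stay) * (s - p)


def waiting_time(p: float, s: float, r: float, y: float) -> float:
    """Equation (10): time until the detection probability decays to the threshold."""
    p_hat = 1.0 / (1.0 + r)
    if not (p > p_hat and p_hat < s < 2.0 * p_hat and y > 0.0):
        raise ValueError("requires p > p_hat, p_hat < s < 2*p_hat, and y > 0")
    return math.log((2.0 * p - s) / (2.0 * p_hat - s)) / (2.0 * y)
```

For p = 0.3, s = 0.3, r = 4, and y = 0.0198, the waiting time is ln(3)/0.0396, roughly 27.7 time units, at which point the detection probability has decayed to 0.2.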
Figure 3. Renewal reward process.


This diagram depicts the renewal reward process if Blue dynamically allocates its resource. For Blue, 
each circle represents a switch point, with solid lines being in state 1 (detection probability p) and 
dashed lines being in state 0 (detection probability s-p). For Red, solid lines indicate state 1 
(attacking) and dashed lines indicate state 0 (not attacking). For the time line, each square represents 
a Red’s learning point, with a solid square being a renewal (Blue in state 1). 


Let $T$ denote the cycle time in this renewal reward process. In addition, denote by
$T_k$ the time until the next renewal if Blue’s current state is $k$, $k = 0, 1$. To compute $E[T_1]$,
consider the next event. If Blue’s current state is 1, then the next event can either be
Blue’s switch to state 0, or Red’s learning Blue’s state. Because the time to each event is
exponentially distributed, the time to either event, whichever occurs first, is also
exponentially distributed with a rate equal to the sum of the two individual rates $y + z$.
With probability $y/(y+z)$, the next event is Blue’s switch to state 0, in which case the
additional time until a renewal is distributed as $T_0$. With probability $z/(y+z)$, the next
event is Red’s learning Blue’s state to be 1, which constitutes a renewal. Therefore, we
can write
$$E[T_1] = \frac{1}{y+z} + \frac{y}{y+z}E[T_0].$$
With a similar argument, we can write
$$E[T_0] = \frac{1}{y+z} + \frac{y}{y+z}E[T_1] + \frac{z}{y+z}E[T_0].$$
Solving the preceding yields
$$E[T_1] = \frac{2}{z}, \quad \text{and} \quad E[T_0] = \frac{1}{y} + \frac{2}{z}.$$
By definition, $T$ and $T_1$ have identical distributions, so
$$E[T] = E[T_1] = \frac{2}{z}.$$
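The first-step argument above leads to two linear equations in E[T_1] and E[T_0]. The sketch below solves them exactly with rational arithmetic; the equations in the comments are a reconstruction from that argument (in state 0, a learning point is not a renewal, so the clock continues), and the function name is hypothetical.

```python
from fractions import Fraction


def expected_cycle_times(y, z):
    """Solve the first-step equations for (E[T_1], E[T_0]) by Cramer's rule.

    Assumed equations, with q = y/(y+z) and m = 1/(y+z):
        E[T_1] = m + q*E[T_0]
        E[T_0] = m + q*E[T_1] + (1-q)*E[T_0]
    """
    y, z = Fraction(y), Fraction(z)
    q = y / (y + z)   # probability Blue switches before Red's next learning point
    m = 1 / (y + z)   # mean time until the next event
    # Rearranged:  T1 - q*T0 = m   and   -q*T1 + q*T0 = m
    a11, a12, b1 = Fraction(1), -q, m
    a21, a22, b2 = -q, q, m
    det = a11 * a22 - a12 * a21
    t1 = (b1 * a22 - a12 * b2) / det
    t0 = (a11 * b2 - a21 * b1) / det
    return t1, t0
```

The solution agrees with E[T_1] = 2/z and E[T_0] = 1/y + 2/z; for example, y = 1/50 and z = 1/8 give 16 and 66.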


Let $X$ denote the number of detected attacks in a cycle, and $Y$ the number of
undetected attacks in a cycle. If Blue is in state $k$ ($k = 0, 1$) and Red is in state 1
(attacking), then let $X_k$ denote the number of detected attacks until the next renewal, and
$Y_k$ the number of undetected attacks until the next renewal.

To compute $E[X_1]$, consider whether Blue switches to state 0 first or Red learns
Blue’s state first. The time until either event occurs follows an exponential distribution
with rate $y+z$, so the expected number of detections during this time period is
$px/(y+z)$. Moreover, with probability $y/(y+z)$, Blue will switch to state 0 first, in
which case the additional number of detected attacks in the cycle is distributed as $X_0$.
With probability $z/(y+z)$, Red will learn that Blue is in state 1 first, which constitutes a
renewal. Therefore, we can write
$$E[X_1] = \frac{px}{y+z} + \frac{y}{y+z}E[X_0], \qquad E[X_0] = \frac{(s-p)x}{y+z} + \frac{y}{y+z}E[X_1] + \frac{z}{y+z}E[X_0].$$
Solving from the preceding yields
$$E[X_1] = \frac{x}{z}s, \quad \text{and} \quad E[X_0] = \frac{x}{y}(s-p) + \frac{x}{z}s.$$

In a similar way, we can set up two linear equations involving $E[Y_1]$ and $E[Y_0]$
as follows:
$$E[Y_1] = \frac{(1-p)x}{y+z} + \frac{y}{y+z}E[Y_0], \qquad E[Y_0] = \frac{\bigl(1-(s-p)\bigr)x}{y+z} + \frac{y}{y+z}E[Y_1] + \frac{z}{y+z}E[Y_0].$$
Solving from these two linear equations yields
$$E[Y_1] = \frac{x}{z}(2-s), \quad \text{and} \quad E[Y_0] = \frac{x}{y}\bigl(1-(s-p)\bigr) + \frac{x}{z}(2-s).$$


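Now we proceed to compute $E[X]$ and $E[Y]$. Let $Z$ denote the time of the first
learning point after the renewal, which follows an exponential distribution with rate $z$.
To compute $E[X]$, condition on the event $Z = t$. If $t < \hat{t}$, then at time $t$, either (1) the
cycle ends if Blue is in state 1, or (2) Red resumes attacks (moves to state 1) if Blue is in
state 0. If $t \ge \hat{t}$, then Red resumes attacks at time $\hat{t}$. Therefore,
$$\begin{aligned}
E[X] &= \int_0^{\hat{t}} P_{10}(t)\,E[X_0]\,z e^{-zt}\,dt + e^{-z\hat{t}}\bigl(P_{11}(\hat{t})E[X_1] + P_{10}(\hat{t})E[X_0]\bigr) \\
&= \int_0^{\hat{t}} \left(\frac{1}{2}-\frac{1}{2}e^{-2yt}\right) z e^{-zt}\,dt\,E[X_0] + e^{-z\hat{t}}\left(\frac{1}{2}+\frac{1}{2}e^{-2y\hat{t}}\right)E[X_1] + e^{-z\hat{t}}\left(\frac{1}{2}-\frac{1}{2}e^{-2y\hat{t}}\right)E[X_0] \\
&= \frac{x}{2z}\,s\,\bigl(1+e^{-z\hat{t}}\bigr) - \frac{x}{2(z+2y)}(2p-s)\bigl(1-e^{-(z+2y)\hat{t}}\bigr),
\end{aligned}$$
where $\hat{t}$ is given in Equation (10). Similarly,
$$\begin{aligned}
E[Y] &= \int_0^{\hat{t}} P_{10}(t)\,E[Y_0]\,z e^{-zt}\,dt + e^{-z\hat{t}}\bigl(P_{11}(\hat{t})E[Y_1] + P_{10}(\hat{t})E[Y_0]\bigr) \\
&= \frac{x}{2z}(2-s)\bigl(1+e^{-z\hat{t}}\bigr) + \frac{x}{2(z+2y)}(2p-s)\bigl(1-e^{-(z+2y)\hat{t}}\bigr).
\end{aligned}$$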
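The closed forms for E[X] and E[Y] can be cross-checked against the conditioning argument by numerical integration. The sketch below is only a sanity check under stated assumptions: it takes the waiting time as ln((2p - s)/(2/(1+r) - s))/(2y), takes E[X_1] = xs/z, E[X_0] = x(s-p)/y + xs/z, E[Y_1] = x(2-s)/z, and E[Y_0] = x(1-(s-p))/y + x(2-s)/z, and uses a midpoint rule whose step count is an arbitrary choice. All names are hypothetical.

```python
import math


def waiting_time(p, s, r, y):
    """Waiting time of Equation (10); assumes 1/(1+r) < p and 1/(1+r) < s < 2/(1+r)."""
    p_hat = 1.0 / (1.0 + r)
    return math.log((2 * p - s) / (2 * p_hat - s)) / (2 * y)


def counts_closed_form(p, s, r, y, x, z):
    """Closed-form expected detected (E[X]) and undetected (E[Y]) attacks per cycle."""
    t_hat = waiting_time(p, s, r, y)
    grow = 1 + math.exp(-z * t_hat)
    decay = (1 - math.exp(-(z + 2 * y) * t_hat)) / (z + 2 * y)
    ex = x / (2 * z) * s * grow - x / 2 * (2 * p - s) * decay
    ey = x / (2 * z) * (2 - s) * grow + x / 2 * (2 * p - s) * decay
    return ex, ey


def counts_by_conditioning(p, s, r, y, x, z, steps=20000):
    """Same quantities from the conditioning formula, integrated by the midpoint rule."""
    t_hat = waiting_time(p, s, r, y)
    ex1, ex0 = x * s / z, x * (s - p) / y + x * s / z
    ey1, ey0 = x * (2 - s) / z, x * (1 - (s - p)) / y + x * (2 - s) / z

    def p10(t):
        # probability that Blue has moved from state 1 to state 0 after t time units
        return 0.5 - 0.5 * math.exp(-2 * y * t)

    h = t_hat / steps
    integral = h * sum(
        p10((k + 0.5) * h) * z * math.exp(-z * (k + 0.5) * h) for k in range(steps)
    )
    tail = math.exp(-z * t_hat)   # probability of no learning point before t_hat
    to_zero = p10(t_hat)
    ex = integral * ex0 + tail * ((1 - to_zero) * ex1 + to_zero * ex0)
    ey = integral * ey0 + tail * ((1 - to_zero) * ey1 + to_zero * ey0)
    return ex, ey
```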


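Red’s long-run reward rate is equal to (renewal reward theory)
$$\begin{aligned}
R(p,y,x,z) &= (+1)\frac{E[Y]}{E[T]} + (-r)\frac{E[X]}{E[T]} \\
&= \frac{x}{4}\left[\bigl(2-(1+r)s\bigr)\bigl(1+e^{-z\hat{t}}\bigr) + (1+r)(2p-s)\frac{z}{z+2y}\bigl(1-e^{-(z+2y)\hat{t}}\bigr)\right].
\end{aligned} \tag{11}$$
Red’s decision variables are $x$ and $z$, subject to $x + \alpha z \le c$. Blue’s long-run reward rate
is
$$\begin{aligned}
B(p,y,x,z) &= (-b)\frac{E[X]}{E[T]} + (-1)\frac{E[Y]}{E[T]} \\
&= \frac{x}{4}\left[\bigl(-2+(1-b)s\bigr)\bigl(1+e^{-z\hat{t}}\bigr) - (1-b)(2p-s)\frac{z}{z+2y}\bigl(1-e^{-(z+2y)\hat{t}}\bigr)\right],
\end{aligned} \tag{12}$$
with decision variables $p$ and $y$.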
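Equations (11) and (12) translate directly into code. The following Python sketch evaluates both reward rates for one town; the function names are hypothetical, and the waiting time is assumed to be ln((2p - s)/(2/(1+r) - s))/(2y), as in Equation (10).

```python
import math


def waiting_time(p, s, r, y):
    """Equation (10); assumes 1/(1+r) < p and 1/(1+r) < s < 2/(1+r)."""
    p_hat = 1.0 / (1.0 + r)
    return math.log((2 * p - s) / (2 * p_hat - s)) / (2 * y)


def _shared_terms(p, y, z, s, r):
    """The two time-dependent factors that appear in both Equations (11) and (12)."""
    t_hat = waiting_time(p, s, r, y)
    grow = 1 + math.exp(-z * t_hat)
    mix = z / (z + 2 * y) * (1 - math.exp(-(z + 2 * y) * t_hat))
    return grow, mix


def red_reward_rate(p, y, x, z, s, r):
    """Equation (11): Red's long-run reward rate."""
    grow, mix = _shared_terms(p, y, z, s, r)
    return x / 4 * ((2 - (1 + r) * s) * grow + (1 + r) * (2 * p - s) * mix)


def blue_reward_rate(p, y, x, z, s, r, b):
    """Equation (12): Blue's long-run reward rate."""
    grow, mix = _shared_terms(p, y, z, s, r)
    return x / 4 * ((-2 + (1 - b) * s) * grow - (1 - b) * (2 * p - s) * mix)
```

As a rough check with r = 4, b = 0.5, s = 0.3, p = 0.3, y = 0.0198, z = 0.13237, and x = 1 - z, these functions give about 0.360 for Red and about -0.436 for Blue per town. Doubling Blue's value for the two towns gives about -0.873, compared with -0.95 from Equation (9) with x = 1.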


Remark 1. An important parameter to consider is the long-run proportion of time
when Red is attacking. From the definition of the renewal process, at the beginning of
each cycle Red will remain in state 0 until either the next learning point, or $\hat{t}$, whichever
occurs first. In other words, the amount of time Red is in state 0 in each cycle is
$\min(W, \hat{t})$, where $W$ follows an exponential distribution with rate $z$. In each cycle, the
expected time that Red is not attacking (state 0) is
$$E[\min(W,\hat{t})] = \int_0^{\hat{t}} w\,z e^{-zw}\,dw + \int_{\hat{t}}^{\infty} \hat{t}\,z e^{-zw}\,dw = \frac{1}{z}\bigl(1-e^{-z\hat{t}}\bigr).$$

Consequently, the long-run proportion of time Red is not attacking (state 0) is
$$\frac{E[\min(W,\hat{t})]}{E[T]} = \frac{1}{2}\bigl(1-e^{-z\hat{t}}\bigr). \tag{13}$$

The long-run proportion of time Red is attacking (state 1) is
$$\frac{1}{2}\bigl(1+e^{-z\hat{t}}\bigr). \tag{14}$$

Remark 2. We assume that the two Red teams are operating independently, 
without any coordination. That is, when the Red team in one town learns the tower’s 
detection probability, it does not give this information to the Red team in the other town. 
In the case when the two Red teams maintain real-time communication, essentially the 


learning rate at each town is doubled and the same analysis applies. 
B. RED’S OPTIMAL STRATEGY


When Blue dynamically allocates its resources between the two surveillance 
towers, each player has two decision variables, as shown in Equations (11) and (12). In 
this model, Blue moves first and Red moves second, with each player trying to maximize 
his own long-run average reward. To compute this equilibrium, we first solve Red's 
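optimization problem for given $p$ and $y$. Although Red has two decision variables, at
optimality the constraint $x + \alpha z \le c$ must hold with equality, because $R(p,y,x,z)$ strictly
increases in $x$ when $z$ is held constant. Substituting $x = c - \alpha z$ into Equation (11), Red's
objective function involves a single variable $z$ as follows:
$$R(z) = \frac{c-\alpha z}{4}\left[\bigl(2-(1+r)s\bigr)\bigl(1+e^{-z\hat{t}}\bigr) + (1+r)(2p-s)\frac{z}{z+2y}\bigl(1-e^{-(z+2y)\hat{t}}\bigr)\right].$$

By letting
$$K_1 = 2-(1+r)s, \qquad K_2 = (1+r)(2p-s),$$
and using Equations (3) and (10) to get
$$\frac{K_1}{K_2} = \frac{\frac{2}{1+r}-s}{2p-s} = \frac{2\hat{p}-s}{2p-s} = e^{-2y\hat{t}},$$
we can simplify $R(z)$ to
$$\begin{aligned}
R(z) &= \frac{c-\alpha z}{4}\left[K_1\bigl(1+e^{-z\hat{t}}\bigr) + K_2\frac{z}{z+2y}\bigl(1-e^{-(z+2y)\hat{t}}\bigr)\right] \\
&= \frac{c-\alpha z}{4}K_2\left[e^{-2y\hat{t}}\bigl(1+e^{-z\hat{t}}\bigr) + \frac{z}{z+2y}\bigl(1-e^{-(z+2y)\hat{t}}\bigr)\right] \\
&= \frac{c-\alpha z}{4}K_2\left[e^{-2y\hat{t}} + \frac{z}{z+2y} + \frac{2y}{z+2y}e^{-(z+2y)\hat{t}}\right].
\end{aligned} \tag{15}$$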
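Proposition 3. The function $R(z)$ is concave in $z$.
Proof: We will show $R''(z) < 0$ to complete the proof. To facilitate the computation, let
$$g(z) = \frac{c-\alpha z}{4}, \qquad h(z) = e^{-2y\hat{t}} + \frac{z}{z+2y} + \frac{2y}{z+2y}e^{-(z+2y)\hat{t}},$$
so $R''(z) = K_2\bigl(g''(z)h(z) + 2g'(z)h'(z) + g(z)h''(z)\bigr)$, where $K_2 = (1+r)(2p-s) > 0$.

For $g(z)$, compute
$$g'(z) = -\frac{\alpha}{4} < 0; \qquad g''(z) = 0.$$

Taking the first derivative of $h(z)$ yields
$$\begin{aligned}
h'(z) &= \frac{2y}{(z+2y)^2} - \frac{2y}{(z+2y)^2}e^{-(z+2y)\hat{t}} - \frac{2y\hat{t}}{z+2y}e^{-(z+2y)\hat{t}} \\
&= \frac{2y}{(z+2y)^2}\Bigl(1 - e^{-(z+2y)\hat{t}}\bigl(1+(z+2y)\hat{t}\bigr)\Bigr) > 0,
\end{aligned}$$
where the inequality follows by letting $\Lambda = (z+2y)\hat{t} > 0$ in the inequality $1 + \Lambda < e^{\Lambda}$. In
addition,
$$\begin{aligned}
h''(z) &= -\frac{4y}{(z+2y)^3}\Bigl(1 - e^{-(z+2y)\hat{t}}\bigl(1+(z+2y)\hat{t}\bigr)\Bigr) + \frac{2y\hat{t}^2}{z+2y}e^{-(z+2y)\hat{t}} \\
&= -\frac{4y}{(z+2y)^3}\left(1 - e^{-(z+2y)\hat{t}}\left(1+(z+2y)\hat{t}+\frac{\bigl((z+2y)\hat{t}\bigr)^2}{2}\right)\right) < 0,
\end{aligned}$$
where the inequality follows by letting $\Lambda = (z+2y)\hat{t} > 0$ in the inequality
$1 + \Lambda + \Lambda^2/2 < e^{\Lambda}$. Because $g(z) \ge 0$ on $[0, c/\alpha]$, it follows that
$R''(z) = K_2\bigl(2g'(z)h'(z) + g(z)h''(z)\bigr) < 0$, so $R(z)$ is concave in $z$.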
Red’s objective is to choose $z \in [0, c/\alpha]$ to maximize $R(z)$. Because $R(z)$ is
concave in $z$, to maximize $R(z)$, first compute
$$R'(0) = \frac{K_1}{4}\bigl(-2\alpha - c\hat{t}\bigr) + \frac{K_2 c}{8y}\bigl(1-e^{-2y\hat{t}}\bigr). \tag{16}$$

Consider two cases:

1. $R'(0) \le 0$: In this case, it is optimal to set $z^* = 0$.

2. $R'(0) > 0$: Red wants to maximize $R(z)$ for $z \in [0, c/\alpha]$. Because $R(z)$ is
concave and $R'(c/\alpha) < 0$, maximizing $R(z)$ is equivalent to solving $R'(z) = 0$. A
simple bisection algorithm is given below to compute $z^*$ such that $R'(z^*) = 0$.
The constant $\delta$ is the error bound on the solution, and $a$ and $b$ denote the
endpoints of the search interval (not the reward parameter $b$).
(a) Let $a \leftarrow 0$ and $b \leftarrow c/\alpha$.
(b) Let $m \leftarrow (a+b)/2$, and compute $R'(m)$.
(c) If $R'(m) = 0$, then $z^* = m$ and exit. If $R'(m) > 0$, then let $a \leftarrow m$; if
$R'(m) < 0$, then let $b \leftarrow m$.
(d) If $b - a > \delta$, go to (b); otherwise $z^* = a$ and exit.
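The bisection steps (a) through (d) can be sketched in Python as follows. This is an illustration, not the decision aid described in the thesis: the derivative is computed from the product form of Equation (15), the case with a nonpositive slope at zero returns a learning rate of zero, and the function name, argument order, and default tolerance are assumptions.

```python
import math


def red_optimal_learning_rate(p, y, s, r, c=1.0, alpha=1.0, delta=1e-9):
    """Maximize the concave R(z) on [0, c/alpha] by bisection on R'(z)."""
    p_hat = 1.0 / (1.0 + r)
    t_hat = math.log((2 * p - s) / (2 * p_hat - s)) / (2 * y)  # Equation (10)
    k2 = (1 + r) * (2 * p - s)

    def slope(z):
        """R'(z) = K2 * (g'(z) h(z) + g(z) h'(z)), with g and h as in Equation (15)."""
        w = z + 2 * y
        e = math.exp(-w * t_hat)
        g, dg = (c - alpha * z) / 4, -alpha / 4
        h = math.exp(-2 * y * t_hat) + z / w + 2 * y / w * e
        dh = 2 * y / w ** 2 * (1 - (1 + w * t_hat) * e)
        return k2 * (dg * h + g * dh)

    if slope(0.0) <= 0:
        return 0.0                      # learning never pays off
    lo, hi = 0.0, c / alpha             # step (a)
    while hi - lo > delta:              # step (d)
        mid = (lo + hi) / 2             # step (b)
        value = slope(mid)
        if value == 0:                  # step (c)
            return mid
        if value > 0:
            lo = mid
        else:
            hi = mid
    return lo
```

With r = 4, s = 0.3, p = 0.3, y = 0.0198, and c = alpha = 1, the sketch returns a learning rate close to the 0.13237 reported in the main example of Chapter III.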
C. BLUE’S OPTIMAL STRATEGY
Denote the optimal learning rate derived from the preceding algorithm by
$z^*(p,y)$, and let $x^*(p,y) = c - \alpha z^*(p,y)$. Let
$$B(p,y) = B\bigl(p, y, x^*(p,y), z^*(p,y)\bigr), \tag{17}$$
which Blue wishes to maximize by choosing $p$ and $y$. To compute Blue's optimal strategy,
we first plot $B(p,y)$ and observe that the function is unimodal in each variable. We use
the following algorithm to compute Blue's optimal strategy.
1. Let $i \leftarrow 0$, and $p_0 \leftarrow \min(1, (s+\hat{p})/2)$. Use the golden section search to
compute $y_0 \leftarrow \arg\max_y B(p_0, y)$.
2. Use the golden section search to compute $p_{i+1} \leftarrow \arg\max_p B(p, y_i)$.
3. Use the golden section search to compute $y_{i+1} \leftarrow \arg\max_y B(p_{i+1}, y)$.
4. If $B(p_{i+1}, y_{i+1}) - B(p_i, y_i) > \delta$, then let $i \leftarrow i+1$ and go to step 2. The parameter
$\delta$ is the error bound.
5. Output $y^* = y_{i+1}$ and $p^* = p_{i+1}$ as Blue's optimal strategy.


We implement the preceding algorithm in Microsoft Excel using VBA. The end 
result is a decision aid that computes the optimal strategies for both players. For more 


details on the decision aid, see Appendix A. 
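For readers without access to the Excel decision aid, the alternating golden section search can be sketched in Python. This is not the VBA implementation. It assumes the closed forms of Equations (10), (12), and (15), assumes the threshold 1/(1+r) lies strictly between s/2 and s, and introduces hypothetical search bounds (`y_lo`, `y_hi`), tolerances, and a round limit that do not appear in the thesis.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # golden ratio conjugate, about 0.618


def wait_time(p, y, s, r):
    """Equation (10), with threshold 1/(1+r)."""
    return math.log((2 * p - s) / (2 / (1 + r) - s)) / (2 * y)


def red_best_z(p, y, s, r, c, alpha, delta=1e-9):
    """Red's learning rate: bisection on R'(z), or 0 when R'(0) <= 0."""
    t, k2 = wait_time(p, y, s, r), (1 + r) * (2 * p - s)

    def slope(z):
        w = z + 2 * y
        e = math.exp(-w * t)
        h = math.exp(-2 * y * t) + z / w + 2 * y / w * e
        dh = 2 * y / w ** 2 * (1 - (1 + w * t) * e)
        return k2 * (-alpha / 4 * h + (c - alpha * z) / 4 * dh)

    if slope(0.0) <= 0:
        return 0.0
    lo, hi = 0.0, c / alpha
    while hi - lo > delta:
        mid = (lo + hi) / 2
        if slope(mid) > 0:
            lo = mid
        else:
            hi = mid
    return lo


def blue_value(p, y, s, r, b, c=1.0, alpha=1.0):
    """Equation (17): Blue's reward rate when Red plays its best response."""
    z = red_best_z(p, y, s, r, c, alpha)
    x, t = c - alpha * z, wait_time(p, y, s, r)
    grow = 1 + math.exp(-z * t)
    mix = z / (z + 2 * y) * (1 - math.exp(-(z + 2 * y) * t))
    return x / 4 * ((-2 + (1 - b) * s) * grow - (1 - b) * (2 * p - s) * mix)


def golden_max(f, lo, hi, tol=1e-6):
    """Golden section search for a maximizer of a unimodal f on [lo, hi]."""
    x1, x2 = hi - INV_PHI * (hi - lo), lo + INV_PHI * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:
            lo, x1, f1 = x1, x2, f2
            x2 = lo + INV_PHI * (hi - lo)
            f2 = f(x2)
        else:
            hi, x2, f2 = x2, x1, f1
            x1 = hi - INV_PHI * (hi - lo)
            f1 = f(x1)
    return (lo + hi) / 2


def blue_optimal_strategy(s, r, b, c=1.0, alpha=1.0,
                          y_lo=1e-4, y_hi=5.0, delta=1e-7, max_rounds=50):
    """Alternate golden section searches over y and p (steps 1 to 5)."""
    p_hat = 1 / (1 + r)
    p_lo, p_hi = p_hat + 1e-6, min(1.0, s)   # keep p above the threshold and s - p >= 0
    p = min(1.0, (s + p_hat) / 2)
    y = golden_max(lambda v: blue_value(p, v, s, r, b, c, alpha), y_lo, y_hi)
    best = blue_value(p, y, s, r, b, c, alpha)
    for _ in range(max_rounds):
        p = golden_max(lambda v: blue_value(v, y, s, r, b, c, alpha), p_lo, p_hi)
        y = golden_max(lambda v: blue_value(p, v, s, r, b, c, alpha), y_lo, y_hi)
        value = blue_value(p, y, s, r, b, c, alpha)
        improved = value - best > delta
        best = value
        if not improved:
            break
    return p, y, best
```

Because golden section search relies on unimodality, which the thesis reports only as an observation from plots, the result of this sketch should be checked against a coarse grid before being trusted.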
III. NUMERICAL ANALYSIS


The model presented in this thesis has five parameters, namely $r$, $b$, $s$, $c$, and $\alpha$.
The parameters $r$ and $b$ model the trade-off between a detected attack and an undetected
attack for Red and Blue, respectively. The parameter $s$ represents the total resources
available to Blue. The parameter $c$ represents the maximum effort Red can divide
between attacking and spying on Blue's tower status. Without loss of generality, we can set
$c = 1$, because using another value is equivalent to scaling the clock to a different time
unit. Finally, the parameter $\alpha$ models the trade-off between Red’s attack rate and its
learning rate.

Intuitively, a small $\alpha$ implies that it is easy to learn about Blue’s tower status, so
Red can set aside more effort to attack. However, the effect of learning depends not only
on Red’s learning rate $z$ but also on Blue’s switch rate $y$. Because Blue can set $y$ freely, it
turns out that the parameter $\alpha$ does not have any effect on the optimal solution.
Mathematically, rewrite Equation (11) as $R(p,y,x,z,\alpha)$ and Equation (12) as
$B(p,y,x,z,\alpha)$ to signify their dependence on $\alpha$, and note that

$$R(p,y,x,z,\alpha) = R(p,\alpha y,x,\alpha z,1),$$

$$B(p,y,x,z,\alpha) = B(p,\alpha y,x,\alpha z,1).$$

In other words, if we treat $\alpha z$ (instead of $z$) as Red’s decision variable and $\alpha y$
(instead of $y$) as Blue’s decision variable, then we convert the original problem to an
equivalent problem with $\alpha = 1$. Consequently, we can also set $\alpha = 1$ without loss of
generality.
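The scaling argument can be illustrated numerically: multiplying both the switch rate and the learning rate by the same factor leaves Red's reward rate unchanged, because the waiting time shrinks by that factor. The sketch below assumes the closed form of Equation (11) with waiting time ln((2p - s)/(2/(1+r) - s))/(2y); the function name and the sample values are arbitrary.

```python
import math


def red_reward_rate(p, y, x, z, s, r):
    """Equation (11), with the waiting time of Equation (10) computed inline."""
    t_hat = math.log((2 * p - s) / (2 / (1 + r) - s)) / (2 * y)
    grow = 1 + math.exp(-z * t_hat)
    mix = z / (z + 2 * y) * (1 - math.exp(-(z + 2 * y) * t_hat))
    return x / 4 * ((2 - (1 + r) * s) * grow + (1 + r) * (2 * p - s) * mix)


alpha = 3.0
base = red_reward_rate(0.3, 0.02, 0.7, 0.1, 0.3, 4.0)
scaled = red_reward_rate(0.3, alpha * 0.02, 0.7, alpha * 0.1, 0.3, 4.0)
```

Here `base` and `scaled` agree up to floating-point rounding, while the constraint on the scaled learning rate becomes the one with the trade-off parameter equal to 1.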

From Blue’s standpoint, the optimal choice of y involves a delicate balance. If y
is too small (say, once a year), then Red can easily take advantage of it by setting a
moderate learning rate without much sacrifice to its attack rate. If y is too large (say,
once an hour), then Red might as well give up learning altogether and attack at the
maximum rate 1, which defeats Blue’s purpose of using decoy surveillance towers. In
other words, Blue’s choice of y needs to be large enough to keep Red honest, and small
enough so that Red has an incentive to set aside some effort to spy on Blue’s operations.


With c = α = 1, three parameters remain to be considered: r, b, and s. In Section
A, we set r = 4, b = 0.5, and s = 0.3 as our main example in order to demonstrate how to
compute the optimal strategy. In Section B, we vary the parameter s in order to show
how dynamic allocation can improve Blue’s performance beyond stationary allocation.
Finally, in Section C, we vary the parameters r and b in order to discuss some interesting
observations.


A. MAIN EXAMPLE 

This section demonstrates how to compute the optimal strategies of Blue and Red
while using dynamic allocation. We consider a plausible scenario by setting
r = 4, b = 0.5. Recall that the dynamic case applies when s ∈ (p̂, 2p̂], where p̂ = 0.2 and
2p̂ = 0.4 according to Equation (3). We set s = 0.3 to demonstrate computation of the
optimal strategy.


1. Red's Optimal Strategy

Recall from Equation (15) that Red has one decision variable, namely the learning
rate z. Red decides on the value of z after finding out Blue’s detection probability p and
switch rate y. For example, if Blue sets p = 0.3 and y = 0.01980, then Red can use
Equation (15) to compute R(z). Figure 4 depicts the function R(z), which is concave in
z as proved in Proposition 3. We then use the bisection method to compute z* = 0.13237,
and from Red’s constraint x + αz = c, we obtain x* = 0.86763. In other words,
with the optimal strategy, on average, Red will learn about Blue tower status once every
7.55 time units, and will attack once every 1.15 time units.
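
The computation described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the thesis's implementation: Equation (15) is not reproduced in this chapter, so `toy_slope` is a hypothetical stand-in for R'(z), chosen only so that its root equals the reported z*; the helper name `bisect_root` is likewise an assumption. Because R(z) is concave, R'(z) is decreasing, so bisection on R'(z) locates the maximizer. The last lines apply Red's constraint x + αz = c with c = α = 1 and convert the two rates into mean times between events.

```python
from typing import Callable


def bisect_root(g: Callable[[float], float], lo: float, hi: float,
                tol: float = 1e-8) -> float:
    """Bisection for the root of a decreasing function g on [lo, hi]."""
    if g(lo) < 0 or g(hi) > 0:
        raise ValueError("the root is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:      # still on the increasing part of R(z)
            lo = mid
        else:               # past the maximizer
            hi = mid
    return 0.5 * (lo + hi)


def toy_slope(z: float) -> float:
    """Hypothetical stand-in for R'(z); NOT Equation (15).

    It is the derivative of the concave toy function 0.13237*z - z**2/2.
    """
    return 0.13237 - z


c = 1.0       # Red's total effort (set to 1 without loss of generality)
alpha = 1.0   # trade-off parameter (set to 1 without loss of generality)

z_star = bisect_root(toy_slope, 0.0, c / alpha)
x_star = c - alpha * z_star            # Red's constraint x + alpha*z = c

mean_time_between_learning = 1.0 / z_star
mean_time_between_attacks = 1.0 / x_star

print(f"z* = {z_star:.5f}, x* = {x_star:.5f}")
print(f"learn every {mean_time_between_learning:.2f} time units, "
      f"attack every {mean_time_between_attacks:.2f} time units")
```

To use the sketch with the actual model, replace `toy_slope` with the derivative of Equation (15) evaluated at Blue's chosen p and y.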
Figure 4. Red's optimal strategy: given Blue’s strategy p = 0.3, y = 0.0198, Red
should use z* = 0.13237.


2. Blue's Optimal Strategy


Recall Blue’s objective function B(p, y) = B(p, y, x*(p, y), z*(p, y)) from
Equation (17). For the main example with r = 4, b = 0.5, and s = 0.3, we plot
B(p, y) in Figure 5.





Figure 5. Blue's objective function B(p, y) for r = 4, b = 0.5, s = 0.3
Although it may be difficult to see, the function B(p, y) in Figure 5 is indeed
unimodal in p and in y. Figure 6 shows the same function when one of the variables is
fixed. We use the golden section search method in each dimension iteratively to compute
p* and y*, as discussed in Chapter II, Section B. In the main example, the algorithm
produces B(p*, y*) = -0.43631 when Blue sets p* = 0.3 and y* = 0.0198. In other
words, the model suggests that Blue’s optimal strategy is to set one tower with detection
probability p = 0.3 and the other with detection probability s - p = 0, and, on average, to
switch between towers every 50.5 time units.
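
The iterative use of golden section search in each dimension can be sketched as follows. This is a minimal sketch under stated assumptions: Equation (17) is not reproduced here, so `toy_objective` is a hypothetical concave stand-in for B(p, y) and its maximizer (0.28, 0.02) is arbitrary; the names `golden_max` and `coordinate_search` are also assumptions. The search bounds for p use p̂ = 0.2 and s = 0.3 from the main example, and the method relies on the objective being unimodal in each variable, as observed numerically in this chapter.

```python
import math
from typing import Callable, Tuple

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0   # reciprocal of the golden ratio


def golden_max(f: Callable[[float], float], lo: float, hi: float,
               tol: float = 1e-8) -> float:
    """Golden section search for the maximizer of a unimodal f on [lo, hi]."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:                      # maximizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:                            # maximizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def coordinate_search(B: Callable[[float, float], float],
                      p_bounds: Tuple[float, float],
                      y_bounds: Tuple[float, float],
                      p0: float, max_rounds: int = 100
                      ) -> Tuple[float, float, float]:
    """Alternate one-dimensional searches in y and p until both settle."""
    p = p0
    y = golden_max(lambda t: B(p, t), *y_bounds)
    for _ in range(max_rounds):
        p_new = golden_max(lambda t: B(t, y), *p_bounds)
        y_new = golden_max(lambda t: B(p_new, t), *y_bounds)
        settled = abs(p_new - p) < 1e-6 and abs(y_new - y) < 1e-6
        p, y = p_new, y_new
        if settled:
            break
    return p, y, B(p, y)


def toy_objective(p: float, y: float) -> float:
    """Hypothetical concave stand-in for B(p, y); NOT Equation (17)."""
    dp, dy = p - 0.28, y - 0.02
    return -0.4 - dp * dp - dy * dy - 0.5 * dp * dy


# p is searched over [p_hat, s] = [0.2, 0.3]; y over an assumed range [0, 1].
p_opt, y_opt, b_opt = coordinate_search(toy_objective, (0.2, 0.3), (0.0, 1.0), p0=0.25)
print(f"p = {p_opt:.5f}, y = {y_opt:.5f}, B = {b_opt:.5f}")
```

With the actual objective, each evaluation of B(p, y) would first solve Red's problem for z*(p, y) and x*(p, y); the mean time between switches is then 1/y*.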





Figure 6. B(p, y*) and B(p*, y) are unimodal in p and y when
r = 4, b = 0.5, s = 0.3.


Although we did not prove it mathematically, B(p, y) is unimodal in p and in y
in all the numerical experiments we conducted. Another example, in which the optimal p
does not lie on the boundary, is shown in Figure 7. Consequently, our algorithm in
Chapter II works well in computing the optimal solution.
Figure 7. B(p, y*) and B(p*, y) are unimodal in p and y when
r = 1, b = 0.1, s = 0.9.





B. EFFECTIVENESS OF DYNAMIC ALLOCATION 


This section compares dynamic allocation with stationary allocation. Notice that
when using dynamic allocation in two towns, the long-run reward rates derived from
Equations (11) and (12) represent Red’s and Blue’s rewards in one town, respectively. For a
fair comparison with stationary allocation, we multiply these two numbers by two. All
the numbers reported in the remainder of this chapter refer to the total reward in the two
towns.

Figure 8. Comparison between dynamic allocation and stationary allocation with
r = 4, b = 0.5




In the main example, we set r = 4, b = 0.5, which yields p̂ = 0.2 and 2p̂ = 0.4,
according to Equation (3). We consider three cases as in Chapter II, Section A.


First, in the case s ∈ [0, p̂], Blue’s long-run reward rate is given by Equation (5), and
Red’s long-run reward rate by Equation (4), as derived in Chapter II, Section A. As shown
in Figure 8, when s ∈ [0, 0.2], Red’s long-run reward rate decreases linearly, and
Blue’s long-run reward rate increases linearly, as Blue’s resources increase.


Second, in the case s ∈ (p̂, 2p̂], Blue has two options for resource allocation:
stationary allocation or dynamic allocation. If Blue chooses a stationary allocation
strategy, we can use Equation (9) to plot Blue’s long-run reward rate, and use Equation
(7) to plot Red’s long-run reward rate. The results of dynamic allocation come from the
algorithm described in Chapter II, Sections B and C. With a dynamic allocation strategy, we
can use Equation (12) to plot Blue’s long-run reward rate, and use Equation (11) to plot
Red’s long-run reward rate.


Third, in the case s ∈ (2p̂, 2], Blue can simply allocate
p₁ = p₂ = s/2 > p̂. It is then optimal for both Red teams to stop their operations, so the
payoffs of all players are zero. Therefore, this case is not shown in Figure 8.
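
The three cases can be summarized in a short Python sketch. It is an illustration only: the function name `allocation_regime` and the returned labels are assumptions, and p̂ is passed in as an input rather than computed, because Equation (3) is not reproduced in this chapter.

```python
def allocation_regime(s: float, p_hat: float) -> str:
    """Classify Blue's total resource s into the three cases of this section."""
    if not 0.0 <= s <= 2.0:
        raise ValueError("s must lie in [0, 2]")
    if s <= p_hat:
        # Case 1: s in [0, p_hat]; Red keeps attacking both towns.
        return "red_attacks_both_towns"
    if s <= 2.0 * p_hat:
        # Case 2: s in (p_hat, 2*p_hat]; compare stationary and dynamic allocation.
        return "compare_stationary_and_dynamic"
    # Case 3: s in (2*p_hat, 2]; Blue sets p1 = p2 = s/2 > p_hat and Red stops.
    return "red_stops_attacking"


# Main example: r = 4 gives p_hat = 0.2, and s = 0.3 falls in the second case.
print(allocation_regime(0.3, 0.2))
```

Only the second case requires the equilibrium computation; in the other two cases the outcome follows directly from the comparison of s with p̂.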


As shown in Figure 8, dynamic allocation is better than stationary allocation for
Blue. In our main example, the improvement in Blue’s long-run reward rate from dynamic
allocation, relative to stationary allocation, grows from 0.15% to 28.71% as s increases
from 0.2 to 0.4.


With Blue’s dynamic allocation strategy, however, Red’s long-run reward rate
also increases. Because Red can learn the tower’s status, Red will attack when attacks are
less likely to be detected, and will pause when attacks are more likely to be detected.
Although Red has a smaller attack rate x (because Red sets aside some effort for
learning), its attacks become more effective. Consequently, Red’s performance also
improves when Blue uses dynamic allocation. In fact, Red’s improvement, as a percentage
of its reward under stationary allocation, is larger than Blue’s. This observation will be
examined again in Section C of this chapter.
Blue’s use of dynamic allocation also affects Red’s operations. In Figure 9, we
plot the long-run proportion of time Red is attacking, as derived in Equation (14). The
proportion of time Red is attacking is higher than 50% (its value under stationary
allocation), and increases when Blue’s total resource s increases. The long-run attack rate,
however, is computed by multiplying Equation (14) by the instantaneous attack rate x. As
seen in Figure 10, Red’s long-run attack rate is less than 0.5 (its value under stationary
allocation), and decreases when s increases.
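
The relation between the two quantities can be illustrated with a short Python sketch. The value x = 0.867629 is Red's attack rate reported by the decision aid for the main example; the proportion 0.52 is a hypothetical value chosen within the range plotted in Figure 9, not a number read from Equation (14), and the function name is an assumption.

```python
def long_run_attack_rate(proportion_attacking: float, x: float) -> float:
    """Long-run attack rate = (proportion of time attacking) * (instantaneous rate x)."""
    if not 0.0 <= proportion_attacking <= 1.0:
        raise ValueError("proportion_attacking must lie in [0, 1]")
    return proportion_attacking * x


x_star = 0.867629            # instantaneous attack rate in the main example
proportion = 0.52            # hypothetical value within the range of Figure 9
stationary_benchmark = 0.5   # stationary-allocation value cited in the text

rate = long_run_attack_rate(proportion, x_star)
print(f"long-run attack rate = {rate:.4f} (stationary benchmark {stationary_benchmark})")
```

Even though Red attacks during slightly more than half of the time, the effort set aside for learning lowers x enough that the product stays below the stationary benchmark.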



















































































Figure 9. Long-run proportion of time Red is attacking in dynamic allocation,
compared with 0.5 in the case of stationary allocation (r = 4, b = 0.5; horizontal axis:
Blue's Total Resource (s), from 0.20 to 0.40).
Figure 10. Red's long-run attack rate in dynamic allocation, compared with 0.5 in the
case of stationary allocation.
C. DISCUSSIONS


In this section, we vary parameters b and r to see how they affect the optimal
strategy. First, we fix r = 4, and compare three values of b, namely 0.1, 0.5, and 0.9, as
shown in Figure 11.




Figure 11. 
As shown in Figure 11, the results look qualitatively the same when b
changes. Dynamic allocation always provides some benefit over stationary allocation,
and the improvement is more significant when s increases. Table 2 reports the improvement
in reward rate under dynamic allocation as a percentage of the reward rate under
stationary allocation, for Blue and Red, respectively. It shows that, for both players, the
improvement is more significant for a larger value of b. In other words, dynamic
allocation is more effective for a larger s or for a larger b. As also seen in Table 2,
Red’s improvement is larger than Blue’s.









                   (columns: increasing values of s, left to right)
b = 0.1    Blue    [illegible]
           Red     13.78%    30.26%    57.56%    126.69%
b = 0.5    Blue     1.96%     5.72%    10.98%     18.40%
           Red     14.21%    31.63%    60.71%    134.47%
b = 0.9    Blue     3.13%     7.76%    13.61%     21.23%
           Red     14.61%    32.91%    63.67%    141.83%

Table 2. Improvement in dynamic allocation as b increases
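
The percentages in Table 2 can be reproduced from the reward rates with a short Python sketch. The formula below, the difference divided by the absolute value of the stationary reward rate, is an assumption inferred from the figures legible in the Experiment worksheet of Appendix A for r = 4, b = 0.9 (Blue: stationary -0.98400, difference 0.20889; Red: dynamic 0.483663, stationary 0.200000); the function name is hypothetical.

```python
def percent_improvement(dynamic: float, stationary: float) -> float:
    """Improvement of dynamic over stationary allocation, in percent.

    The absolute value in the denominator keeps the sign meaningful for Blue,
    whose reward rates are negative.
    """
    if stationary == 0:
        raise ValueError("stationary reward rate must be nonzero")
    return 100.0 * (dynamic - stationary) / abs(stationary)


# Figures legible in the Experiment worksheet (r = 4, b = 0.9).
red_improvement = percent_improvement(0.483663, 0.200000)
blue_dynamic = -0.98400 + 0.20889          # stationary value plus reported difference
blue_improvement = percent_improvement(blue_dynamic, -0.98400)

print(f"Red: {red_improvement:.2f}%, Blue: {blue_improvement:.2f}%")
```

Both results agree, after rounding to two decimals, with the last column of the b = 0.9 rows (141.83% for Red and 21.23% for Blue).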


Next, we repeat the experiment for r = 1 and r = 9, and plot the results in Figure
12 and Figure 13, respectively.
Figure 12. Comparison between dynamic allocation and stationary allocation with
r = 1, b = 0.1, 0.5, 0.9

Figure 13.

As shown in Figure 12, there are some cases where dynamic allocation does not
provide additional benefits beyond those of stationary allocation. For instance, when
r = 1 and b = 0.1, the optimal solution to dynamic allocation coincides with stationary
allocation when s is between 0.5 and 0.8. We plot Red’s optimal strategy in this case in
Figure 14.

Figure 14. Red’s optimal strategy when the optimal solution to dynamic allocation
coincides with stationary allocation


Qualitatively speaking, dynamic allocation tends to be less effective when r, b,
and s are small. Below we offer some intuitive explanations. When r is small (close to
0), Red is not very concerned with a detected attack, so Red has less incentive to invest in
learning, which makes dynamic allocation less effective. When b is large (close
to 1), Blue is nearly indifferent between a detected and an undetected attack. Instead, it is
important for Blue to reduce Red’s long-run attack rate, which can be accomplished by
dynamic allocation. Finally, when s is small (close to p̂), Red’s expected payoff for each
attack is only slightly less than 0, even if the detection probability is s. Therefore, Red
has less incentive to find out Blue’s tower status, which makes dynamic allocation less
effective.

In summary, dynamic allocation tends to be more effective when r, b, and s are larger.
IV. CONCLUSIONS


This thesis examined how to operate two surveillance towers most effectively
with limited manpower (surveillance resources). In particular, we studied a dynamic
allocation strategy, in which the surveillance team is moved between the two towers
intermittently. Because it is difficult to tell from the outside whether a surveillance tower
is fully functional, the understaffed tower can serve as a decoy to deter insurgent
activities. The problem was formulated as a two-person nonzero-sum game between the
insurgents and the government forces, with the latter moving first. After presenting an
algorithm to compute the equilibrium in this game, we demonstrated the findings
numerically.


Our analysis suggests that the dynamic allocation strategy can improve the
performance of surveillance towers under most circumstances. The improvement tends to
be more significant when government forces have more surveillance resources. Dynamic
allocation tends to be less effective when (1) a detected attack has a smaller negative
impact on insurgent operations, or when (2) a detected attack brings a larger immediate
benefit to government forces. Our model applies not only to military operations but also
to surveillance problems in general.


There are some limitations to our model. First, it assumes that attacks follow a
Poisson process. Second, it assumes that there is no cost to switch resources between
surveillance towers. This assumption may be reasonable when the videos from two
towers are fed to a single control room, but not if there are two separate control rooms.
Third, the model assumes that the detection probability increases linearly in the allocated
resource.


There are many possible future research directions. First, it may be worthwhile to
study an asymmetric model with different parameters in the two towns. Second,
extending the analysis to more than two surveillance towers can give government forces
more flexibility. Finally, this thesis studies the Stackelberg equilibrium, in which
government forces move first and insurgents move second. It is important to study whether
there are other types of equilibria, especially an equilibrium that results from
simultaneous moves.
APPENDIX


A. DECISION AID 


This Excel file implements the algorithms described in Chapter II, and consists of
six worksheets. The green cells require the user to enter input values. The yellow cells
contain results computed by VBA code. The red cells contain formulas, which should not
be modified by the user. Below we explain the worksheets one at a time.


1. Parameter








[Screenshot of the Parameter worksheet: an "Input parameters" block, the Reward Table,
the "Input Available Resource (s)" cell, and a button labeled "Is dynamic allocation
useful? (Recommendation for Resource Allocation)". Note on the sheet: "Input values,
Can be modified".]
This worksheet allows a user to enter the model parameters, namely r, b, α, c, and s.
The reward table is then formulated and p̂ is computed. When the user clicks the button,
the program checks the value of s and shows the corresponding result, as in Figure 15. In
addition, the default precision for computation is 1.0E-8, which can be changed according
to the accuracy the user desires.


[Three "Allocation Result" dialog boxes, one for each range of s:]

In this case, 0 <= s <= pHat, --> [0,pHat]
No matter how Blue allocate resource, Red continuously attack both towns!
Long-Run reward rate:
Red = 1.5, Blue = -1.99

In this case, 2pHat < s <= 2, --> (2pHat,2]
Red should STOP attacks in both towns!
Long-run reward rate:
Red = 0, Blue = 0

In this case, pHat < s <= 2PHat, --> (pHat,2PHat]
Consider Stationary and Dynamic allocation in next step!

Figure 15. Allocation recommendations for different s
2. Blue





















[Screenshot of the Blue worksheet. The left-hand side holds the inputs pHat, Initial Cut
"p", and s, the blocks "Optimal for Blue in Dynamic Allocation" and "Optimal for Red in
Dynamic Allocation", and a "Golden Search" button. The right-hand side lists the
iterations of the search:]

Iteration   p          y             x          z          B*            R*
1           0.250000   0.006426      0.921573   0.078427   -0.452082     0.313388
2           0.266326   [illegible]   0.918066   0.081934   [illegible]   0.345017
3           0.266326   0.010437      0.901380   0.098620   -0.446068     0.329380
4           [illegible]
5           0.288443   0.016476      0.878207   0.121793   -0.439312     0.349545
6           0.300000   0.016476      0.875476   0.124524   -0.437105     0.369399
7           0.300000   0.019802      0.867629   0.132371   -0.436312     0.359523
8           0.300000   0.019802      0.867629   0.132371   -0.436312     0.359523

This worksheet is activated if s ∈ (p̂, 2p̂]. In this worksheet, the user needs
to enter an initial cut “p” (p ∈ [p̂, s]) as a starting point for the program to perform the
golden section search, as explained in Chapter II, Section C. The optimal strategies for
Blue and Red are then reported. Also, the iteration results of the golden section search
are listed for reference on the right-hand side. This worksheet implements the algorithms
in Chapter II, Section C, “Blue’s Optimal Strategy.”
[Screenshot of the Red worksheet, with the inputs pHat, p, s, the number of z values, and
the precision, a plot of R(z) against z for p = 0.3, y = 0.0198, and a table of z, R'(z),
and R(z). Legible rows of the table:]

z          R'(z)       R(z)
0.00000     2.59505    0.25000
0.00667     2.26323    0.26617
0.01333     1.97264    0.28027
0.02000     1.71774    0.29255
0.02667     1.49375    0.30324
0.03333     1.29660    0.31253
0.04000     1.12275    0.32058
0.04667     0.96917    0.32754
...
0.16667    -0.14376    0.35686
0.17333    -0.16477    0.35583


This worksheet plots Red’s objective function R(z) and computes Red’s optimal
strategy, z* and x*, as explained in Chapter II, Section B. Blue’s strategy p and y are
required inputs. This worksheet implements the algorithm in Chapter II, Section B,
“Red’s Optimal Strategy.”
[Screenshot of the worksheet that searches over y with p held fixed (r = 4, b = 0.5,
s = 0.3): a plot of B(p, y) against y and a table of the evaluated points; the tabulated
values are illegible.]
































































By holding p constant, this worksheet plots B(p, y) as a function of y, and
computes the y that maximizes it using the golden section search method. The
value of p is a required input. This worksheet implements the algorithm in Chapter II,
Section C, “Blue’s Optimal Strategy.”
By holding y constant, this worksheet plots B(p, y) as a function of p, and
computes the p that maximizes it using the golden section search method. The
value of y is a required input. This worksheet implements the algorithm in Chapter II,
Section C, “Blue’s Optimal Strategy.”
Experiment













































































































































































































































































































































[Screenshot of the Experiment worksheet for r = 4, b = 0.9. For each value of s, the table
reports the long-run reward rates of Blue and Red under dynamic and stationary
allocation, with their difference and percentage improvement, together with x, z, Red's
long-run attack rate, and the proportion of time Red is not attacking. The chart plots the
series Red Dynamic, Red Stationary, Blue Dynamic, and Blue Stationary against Blue's
Total Resource (Detection Probability), from 0.00 to 0.40. The last legible row reads:
Blue stationary -0.98400, difference 0.20889, improvement 21.23%; Red dynamic 0.483663,
stationary 0.200000, difference 0.283663, improvement 141.83%.]
























































Given r, b, α, and c, this worksheet compares dynamic and stationary allocations
for different values of s. In addition to the numbers, it shows a plot comparing the two
strategies.